
Kernelized Bayesian Softmax for Text Generation

Neural Information Processing Systems

Neural models for text generation require a softmax layer with proper token embeddings during the decoding phase. Most existing approaches adopt a single point embedding for each token. However, a word may have multiple senses depending on its context, and some of these senses can be quite distinct. In this paper, we propose KerBS, a novel approach for learning better embeddings for text generation. KerBS embodies two advantages: (a) it employs a Bayesian composition of embeddings for words with multiple senses; (b) it is adaptive to the semantic variances of words and robust to rare sentence contexts by imposing learned kernels to capture the closeness of words (senses) in the embedding space. Empirical studies show that KerBS significantly boosts the performance of several text generation tasks.
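The core idea in the abstract — score each word by Bayesian-composing several sense embeddings through a learned kernel, then take a softmax over words — can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation: the kernel here is a simple exponential of a dot-product similarity with a per-sense sharpness (variance-like) parameter, and all names (`kerbs_logits`, `sense_emb`, `theta`) are illustrative assumptions.

```python
import numpy as np

def kerbs_logits(h, sense_emb, theta):
    """Sketch of a kernelized multi-sense softmax layer.

    h:         (d,)     decoder context vector
    sense_emb: (V, K, d) K sense embeddings per vocabulary word
    theta:     (V, K)   learned per-sense sharpness of the kernel
                        (an assumed stand-in for the paper's learned kernels)
    Returns (V,) word logits: each word's score is the log of the summed
    kernel responses of its senses (a Bayesian composition over senses).
    """
    # similarity between the context and every sense embedding
    sims = np.einsum('vkd,d->vk', sense_emb, h)        # (V, K)
    # kernel response: positive, with per-sense learned sharpness
    responses = np.exp(np.exp(theta) * sims)           # (V, K)
    # compose senses by summation, score the word in log space
    return np.log(responses.sum(axis=1))               # (V,)

def kerbs_softmax(h, sense_emb, theta):
    """Normalize the per-word scores into next-token probabilities."""
    logits = kerbs_logits(h, sense_emb, theta)
    z = np.exp(logits - logits.max())                  # stabilized softmax
    return z / z.sum()
```

A standard softmax layer is recovered as the special case of one sense per word with a fixed kernel sharpness; the extra senses and learned sharpness are what let the layer fit multiple, differently spread clusters of context vectors for the same word.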


Reviews: Kernelized Bayesian Softmax for Text Generation

Neural Information Processing Systems

This paper builds on the motivation that context vectors from a language model, such as BERT, often cluster into separate groups for the same next word. These clusters may correspond to different senses of the word, and often have varying variances. The authors argue that a traditional softmax is not expressive enough to capture these clusters. A similar argument was made by Yang et al. in their Mixture of Softmaxes (MoS) paper. The solution presented here is quite different though -- to allocate multiple senses to each word in the output embedding table, and to use a parameterized kernel to model the variance. The ideas are pretty neat, and as far as I know, original.


Reviews: Kernelized Bayesian Softmax for Text Generation

Neural Information Processing Systems

This work proposes an approach to accommodate multiple senses of words as different embeddings. Reviewers' positive assessment was influenced by the additional experiments and results provided in the authors' response, so authors are definitely expected to incorporate those (or results to the same effect) in the final version.



Kernelized Bayesian Softmax for Text Generation

Miao, Ning, Zhou, Hao, Zhao, Chengqi, Shi, Wenxian, Li, Lei

Neural Information Processing Systems
